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Evolutionary dynamics and patterns of molecular evolution are strongly influenced by 
selection on linked regions of the genome, but our quantitative understanding of these 
effects remains incomplete. Recent work has focused on predicting the distribution of 
fitness within an evolving population, and this forms the basis for several methods that 
leverage the fitness distribution to predict the patterns of genetic diversity when se- 
lection is strong. However, in weakly selected populations random fluctuations due to 
genetic drift are more severe, and neither the distribution of fitness nor the sequence 
diversity within the population are well understood. Here, we briefly review the moti- 
vations behind the fitness-distribution picture, and summarize the general approaches 
that have been used to analyze this distribution in the strong-selection regime. We 
then extend these approaches to the case of weak selection, by outlining a pcrturbative 
treatment of selection at a large number of linked sites. This allows us to quantify the 
stochastic behavior of the fitness distribution and yields exact analytical predictions for 
the sequence diversity and substitution rate in the limit that selection is weak. 



A central goal of modern population genetics is to pre- 
dict the diversity and fate of DNA sequences within a 
population, taking into account the joint effects of muta- 
tion, recombination, natural selection, and demography 
at the sequence level. Diversity is a fundamental fea- 
ture on these genomic scales, since the typical mutation 
rates in most organisms are sufficiently large that a num- 
ber of sequence variants are likely to coexist within the 



population at any given time ( 


Begun et al.| 2007 Kre- 


itman 


1983 


Lewontin and Hubby[ 1966| Nelson et al.| 


2012 




Nik-Zinal et al. 


2012 


Rambaut et al. 


2008 


). It is 



therefore imperative that our models of sequence evolu- 
tion should be able to describe a large number of variants 
at disparate sites within the genome, possibly with differ- 
ent effects on the reproductive fitness of each individual 



(Hahn 2008) 



This picture of extensive diversity at the sequence level 
stands in contrast to the large body of population ge- 
netics theory developed during the first half of the 20th 
century, which typically focused on the fate of a single 
mutant allele (relative to the wildtype) at a single ge- 
netic locus. Numerous mathematical models have been 
proposed, even for this highly simplified scenario, which 
correspond to different underlying assumptions about the 
mechanisms and stochasticity of natural selection, the re- 



productive lifecycle of the organism, and so on ( Ewens 



2004). In large populations, many of the differences be- 
tween these various models become negligible, and an ele- 
gant theoretical description of the two-allele, single-locus 
system can be obtained from the standard diffusion limit 



(Kimura 1955 1. The frequency / of a mutant allele with 



fitness effect s in a population of size N is assumed to 



satisfy the stochastic differential equation 



(1) 



selection 



genetic drift 



where 77 (t) is a stochastic noise term which will be de- 
fined in more detail below. Equation ([T]) relates the rate 
of change in / to the deterministic action of selection 
and the random effects of genetic drift, and it is formally 
equivalent to the diffusion equation for the probability 
distribution of / typically cited in the population genet- 
ics hterature (IKorolev et al.l [2010]). Although the full 



solution to Eq. (1) is quite complicated (Kimura 1955 



Song and Steinriicken 2012), this difi'usion model is sim- 



ple enough to admit a number of useful and exact results, 
including the well-known formula for the probability of 
fixation of a new mutant 
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in the limit of low mutation rate /i. The historical impact 
of this diffusion model cannot be overstated, and these 
simple results played a large role in illuminating both the 
qualitative and quantitative effects of genetic drift arising 
from the finite size of the population. However, extending 
these single-locus results to an explicitly sequence-based 
setting proves to be quite challenging. 

In principle, one can treat the entire genome as a sin- 
gle locus with each possible genotype represented by a 



2 



unique allele. A genome of length L would therefore re- 
quire 2^ separate alleles and a corresponding system of 
diffusion equations relating the 2-^ — 1 independent al- 
lele frequencies. This clearly becomes unwieldy for large 
genomes since the number of alleles grows exponentially 
with L, and the sparse mutational connectivity between 
the different sequences and their varying fitnesses re- 
moves much of the desired symmetry from the problem 
(Ethier and Kurtz 1987). Even for a genome with just 



L = 2 sites, exact solutions can only be found for a few 
special cases, and one must often resort to numerical cal- 



culations (Barton and Etheridge 



simulations (Hill and Robertson 



1966). 



2004) or Monte-Carlo 



A popular alternative approach is to treat each site in 
the genome as a separate locus and assume some sort 
of quasi-independent evolution among the various loci, 
so that the single-locus model in Eq. ([T]) applies to 



the marginal nucleotide frequencies at each site (Sawyer 



and Hartl 1992). This independent-sites approximation, 
which is exact in the limit of infinite recombination, re- 
flects a historical perception of linkage as an infrequent 
and generally small correction to an otherwise freely- 
recombining set of loci, as is often the case for a quantita- 
tive trait with genetic contributions from several distant 



sites ( 


Barton and Turelli 


1991 


Falconer 


1960 


Neher 


and Shraiman 2011b 


). But given the typical recombina- 



tion rates in most organisms, this assumption is likely to 
break down on local genomic scales, and effectively asex- 
ual selection on particular haplotype blocks may be a 



more accurate description (Franklin and Lewontin 1970 
Slatkin 1972). Moreover, it has been shown that selec- 
tion within these linked regions leads to large deviations 
from the predictions assuming independent evolution be- 
tween the various sites, even after adjusting for possible 



reductions in the effective population size (Bustamante 



itman[ [20021 [Good and Desai[ [2012) [Messer and Petrov 
2012). Correctly accounting for the effects of selection 



et al.[ 2001[[Charlesworth et al. 1993[ Comeron and Kre 



on local genomic scales remains one of the major out- 
standing problems in population genetics, and is a nec- 
essary prerequisite if we wish to take full advantage of 
the increasing availability of DNA sequence data in order 
to make inferences about the evolutionary forces acting 



within a population (Pool et al. 20101 



Recent advances in this area have employed a third 
approach — situated somewhere between the genotypes- 
as-alleles and sites-as-loci schemes — in which the distri- 
bution of fitnesses in the population plays a central role 



(Desai and Fisher 


12007 Goyal et al. 


2012 Haigh 1978 


Hallatschek 2011 


Neher et al. 2010 


Ohta and Kimura 


1973 Park and K 


rug 2007 Rouzine et al. 2003; Tsim-| 


ring et al. 19961. 


Although the fitness distribution may 



seem to be rather tangential to the sequence-oriented 
questions introduced above, this quantity turns out to 
play an important role in mediating the effects of linked 
selection within the population, and several promising 



methods predict the behavior of individual sequences 
based on their interactions with this population- wide dis- 



tribution ( 


Good et al. 




2012 


Hudson and Kaplan 


1994 


Neher and Shraiman| | 


2011a 


O'Fallon et al. 2010| 


Wal-| 


czak et al. 2012 Zeng 


and Charlesworth| 2011). Instead 



of tracking the frequencies of all possible genotypes or 
just the marginal frequencies at each site, this approach 
requires an explicit model for the frequency of individu- 
als at each possible fitness, otherwise known as a fitness- 
class. Here too, the interactions between mutation, re- 
combination, drift, and selection can be quite complex, 
and significant progress has been made only in the case 
where genetic drift is negligible compared to these other 
evolutionary forces. This can often be a reasonable ap- 
proximation in many populations, since the effects of ge- 
netic drift are typically less severe for the fitness classes 
than for the frequencies of the underlying genotypes. 

Nevertheless, even in this fitness-class picture the ef- 
fects of genetic drift cannot be excised completely, since 
they play a crucial role in the high-fitness "nose" of the 
fitness distribution that often controls the behavior in 



the rest of the population (|Brunet et al. 2008| 


Desai and 


Fisher 


2007 


Goyal etal. 2012| 


Hallatschek 


2011 Neher 


and Shraiman 


2012 


). Various ad- hoc methods have been 



devised to account for this drift-dominated nose and its 
relation to the deterministic behavior in the bulk popu- 
lation, which yield accurate predictions for simple quan- 
tities such as the average rate of adaptation and the fix- 
ation probability of new mutations. Yet because of their 
ad-hoc nature, it is not entirely clear when these approx- 
imations are likely to be valid, or whether they remain 
appropriate for more complicated quantities of interest. 
Furthermore, in populations with weaker selection this 
separation between the drift-dominated nose and the de- 
terministic bulk starts to break down, and the random 
nature of genetic drift becomes important throughout the 
entire fitness distribution. 

In the present work, we follow an approach that is or- 
thogonal to both the weak-drift limit of this fitness-class 
description as well as the weak-mutation limit implicit in 
the standard single- locus treatment. Rather, we seek a 
fitness-class description for a regime with weak selection 
at a large number of linked sites. Suitably defined, the 
neutral limit of the population "fitness distribution" is 
exactly solvable, and the corrections in the presence of 
selection can be calculated order by order as a perturba- 
tion series in powers of the selection strength. The re- 
sulting expressions may have relevance to sequence data 
obtained from natural populations [particularly in the 



context of the nearly-neutral theory of evolution ( Ohta 



1992 )] , but their primary value is qualitative. The zeroth- 
order neutral description offers a valuable window into 
the stochastic aspects of the population fitness distribu- 
tion in the absence of the complicating effects of selec- 
tion, while the higher-order terms give the exact correc- 
tions from interference at a large number of linked sites 



3 



and help illuminate the previously obscure transition to 
neutrality. The exact nature of these selective corrections 
provides a valuable check on a number of common heuris- 
tic assumptions in the literature, which should agree with 
our asymptotic results when selection becomes weak. 



I. FITNESS CLASSES AND THE POPULATION 
FITNESS DISTRIBUTION 

The distribution of fitnesses within the population is 
itself a random object which changes in time and reflects 
the inherent stochasticity of the evolutionary process. 
Two populations with the same genetic composition and 
the same set of available mutations will typically pos- 
sess different fitness distributions after evolving indepen- 
dently for the same amount of time, although these distri- 
butions will be related in some statistical sense. Like the 
stochastic frequency of a single mutant allele discussed 
above, the statistical properties of the fitness distribu- 
tion can be described by a generalization of the diffusion 
model in Eq. ([T]) that makes both the large population 
and long genome limits explicit. We consider a popula- 
tion of N haploid individuals that acquire new mutations 
at a total rate U per generation. We assume that these 
mutations occur over a large number of loci, each with 
relatively small contributions to the total fitness, so that 
a mutation of effect s arising in an individual with (log) 
fitness X increases its fitness to X + s. Furthermore, we 
assume that the number of loci is sufficiently large, and 



epistasis sufficiently weak, that the set of available muta- 
tions can be approximated by a continuous distribution 
of fitness effects p{s) that remains constant throughout 
the relevant time interval. 

The random arrival of new mutations and the effects 
of genetic drift are treated by a continuous-time stochas- 
tic model similar to the one introduced in IHallatschekl 
( 2011[ ). Let f{X,t) denote the relative frequency of in- 
dividuals with absolute fitness X at time i, normalized 
so that J dX f{X,t) = 1. In some infinitesimal time 
dt, these frequencies are incremented according to the 
stochastic update rule 



f{X, t + St) fix, t) + Xf{X, t)5t 

+ U [ ds [f{X -s,t)- f{X, t)] 6t 

v{x,t). 



(4) 



f{X,t)St 



N 



where ri{X, t) denotes a set of independent Gaussian noise 



terms with zero mean and unit variance ( Gardiner 1985 1 , 



and the constant of proportionality is chosen to satisfy 
the population size constraint 



j dX f{X, t + St) = l. 



(5) 



This yields a familiar Langevin equation for the fitness 
distribution 



dfjX) 
dt 



X - X{t)] fix) + U ds p{s) [fix - s) - fix)] + / dX' [Six' -X)- fix)] 



clccti( 



mutation 



fjx') 

N 



vix') 



(6) 



genetic drift 



where Xit) — J dX X fiX,t) is the mean fitness of the 
population (see Fig. [T]). Like the diffusion approxima- 
tion at a single locus, this stochastic model is thought to 
describe the universal behavior that emerges in the limit 
that N ^ oo and L ^ oo, while the per-site mutation 
rate p and the relevant fitnesses X tend to zero in such 
a way that the scaled quantities NU = NLfj, and NX 
completely determine the dynamics. This scaling behav- 
ior provides an important check on our our intuition (as 
well as our algebra) , since it implies that any effects that 
depend on 1/N, X, or Np. alone are competely negligi- 
ble in this model unless we explicitly relax one of these 
assumptions (e.g., the finite site effects in Appendix A). 

Two of the defining features of this stochastic model 
arise from the population size constraint in Eq. ([S]) that 
connects Eqs. Q and (|6|: the resulting selection term 
becomes a nonlinear function of /(X, t), and the previ- 



I 

ously simple noise terms acquire a complicated correla- 
tion structure. Such features are inherent in any model 
that imposes a population size constraint in this manner. 
The only fitnesses that matter are the relative fitnesses 
X = X — Xit), which depend not just on the properties 
of a particular DNA sequence, but also on the global be- 
havior of all the sequences in the population. Moreover, 
the action of genetic drift is correlated among the various 
fitness classes in order to respect the constant population 
size. 

Yet despite the complex correlation structure of this 
drift term, we have constructed our stochastic model so 
that its average effect vanishes at any particular time. 
Thus, we are led to examine the average profile ifiX, t)), 
which represents the expected value of the fitness dis- 
tribution averaged over many independent populations. 
Taking the expectation of both sides of Eq. ^ , we find 
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Fitness, X 



FIG. 1 A schematic depiction of the population fitness distri- 
bution, f{X,t), which is obtained by grouping together geno- 
types with the same absolute fitness X. Important features of 
this distribution include the mean fitness X and the standard 
deviation a, which is proportional to the typical fitness differ- 
ence between two individuals in the population. We have also 
highlighted the high-fitness "nose" of the distribution, where 
genetic drift continues to dominate even in extremely large 
populations. 



that this average profile {f{X,t)) is governed by the de- 
terministic differential equation 



djfjX)) 
dt 



X{f{X))- / dX'X'{f{X)f{X')) 



(7) 



-f dsp{s)[{f{X-s))-{f{X))] 



However, this moment equation does not close: the non- 
linear selection term in Eq. ([6| implies that the future be- 
havior of (/(X)) does not just depend upon its own value 
in the present, but also on the two-point correlation func- 
tion {f{X)f{X')). The time evolution of this two-point 
correlation function will in turn depend on the three- 
point correlation function, and so on. One must there- 
fore solve an infinite hierarchy of these moment equations 
in order to obtain predictions for the mean behavior. A 
similar hierarchy arises for the single-locus diffusion in 
Eq. ([T]), but in that case the simplicity of the drift term 
permits an exact solution. In contrast, the lack of clo- 
sure among the moments of the fitness distribution is 
arguably the primary obstacle for a quantitative descrip- 
tion of large numbers of interfering mutations, and many 
statistical properties of the fitness distribution remain 
unknown as a result. 

Much of the existing work in this field has essentially 
focused on various ways to approximate this correlated 
selection term. This is often achieved through some sort 
of approximate factorization of the form 

{{X-X)f{X))^{X-{X)){f{X)) (8) 

so that the statistical aspects of the nonlinearity are 



marginalizecj^ Given that the magnitude of genetic drift 
is proportional to one regime where this approxi- 

mation appears quite naturally is in the strong-selection 
limit N\X — X\ 3> 1, when the genetic drift term can be 
neglected in Eq. ^ and all higher correlations vanish. 
The canonical example of such a regime is the deleterious 
mutation-selection balance attained under strong puri- 



fying selection (Haigh 1978), which has been intensely 



studied in the context of Muller's ratchet (Etheridge 


et a 


[ |2007 Gessler 1995 |Gordo and Charlesworth 


2000 


|Higgs and Woodcock 


11995 


|Jain| |2008i ,Muller 


1964 


|Neher and Shraiman 


2012 ; 


stephan et al.l 1993 


Waxman and Loewe 


2010 


and background selection 


( Charlesworth et al.| 


1993 


Gordo et al. 2002 Hudson 


and Kaplan 




1994| Nicolaisen and Desail |2012 |Walczak 


et al. 


2012 


. 


Neher and Shraiman 


(2012) have demon- 



strated that the deterministic limit becomes exact in this 
particular case when selection is infinitely strong. How- 
ever, if beneficial mutations are present, or if some of 
the deleterious mutations are weakly selected, then even 
in this extreme limit the factorization in Eq. Q does 
not hold for all X, since it starts to break down near 
the high- fitness "nose" of the distribution (see Fig. [I]), 
where a relatively small number of individuals have an 
outsized chance of taking over the population. Thus, 
in this strong-selection limit one often speaks of a divi- 
sion of the population into a drift-dominated nose (where 
stochasticity is extremely important) and a deterministic 
bulk where Eq. ^ holds. When the disinction between 
these regions is sufficiently sharp, a number of highly suc- 
cessful (although somewhat ad-hoc) approximations have 
been developed to treat the stochasticity in the nose and 
to self-consistently match this behavior with the deter- 



ministic bulk of the population (|Desai and Fisher 


2007 


Goyal et al.l |2012| |Neher and Shraiman 


2012 


Rouzinel 


et al. 2003 


Tsimring et al. 1996). Several alternative 



approaches are based on a modification of the stochas- 
tic dynamics in Eq. ([6|, which is chosen in a particular 
way so that the nonlinearity in the selection term van- 



( Fisher 


2012 


Hallatschek 


2011 



models may be more appropriate when the boundary be- 
tween the stochastic nose and the deterministic bulk is 
less pronounced, but their relation (and relevance) to the 
standard evolutionary model in Eq. ([6]) must be justified 
on an ad-hoc basis. 

These methods are by far the most promising can- 
didates for describing the evolutionary dynamics in 
a strong-selection regime relevant to many laboratory 
evolution experiments, microbial populations, or other 
rapidly adapting organisms. Yet from a purely theoret- 



^ Strictly speaking, this sort of approximation is often more appro- 
priate when this simple ensemble average is replaced with some 
other averaging scheme (see below) or an alternative measure of 
the "typical" behavior l lFisherl l2012| 
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ical standpoint, they suffer from a major sfiortcoming 
in that they attempt to describe a parameter regime for 
which no exact asymptotic description has been found. 
Although these methods were devised to approximate 
this asymptotic behavior, their correctness (apart from 
self-consistency) can only be validated by numerical com- 
parisons to Monte-Carlo simulations of Eq. Q for partic- 
ular parameter values. This can make it difhcult to test 
the individual assumptions that enter into these approxi- 
mations or to compare different approximation methods, 
and it offers little direct information about which quan- 
tities or parameter regimes fall outside their domain of 
validity. On a more practical level, there may be many 
populations that are dominated by a large number of 
weakly selected mutations where these strong-selection 
methods do not apply. In this case, few quantitative de- 
scriptions exist apart from assuming strict neutrality, and 
our knowledge of the relevant processes in this regime is 
extremeley limited. 

Thus, while previous approaches have attempted to 
reconcile the joint effects of selection and drift by mostly 
neglecting the latter, our approach here will be exactly 
the opposite. Rather than focus on those regimes where 
the selection term can be factored like Eq. (|8|, we con- 
sider the weak-selection limit where this selection term 
can be neglected entirely, or at least treated as a small 
perturbation. Unlike the strong-selection limit, the ze- 
roth order solution in this nearly-neutral regime can be 
treated exactly and the full statistical behavior can be 
elucidated, which leads to a natural (and similarly ex- 
act) perturbation expansion in the presence of selection. 



II. THE NEUTRAL LIMIT 

There are a variety of ways we could define the neutral 
limit of Eq. but we are interested in one which does 
not lead to a trivial description of the resulting "fitness 
distribution." For example, there is a naive limit in which 
the fitness effects of all new mutations have s = 0, which 
implies that the entire population is confined to a single 
"fitness class" with fitness X = for all time, i.e. 



Langevin equation 



fix) = 6{X) 



(9) 



However, we can maintain much more of the interest- 
ing multi-locus behavior by ignoring the absolute fitness 
of each individual for the moment and concentrating in- 
stead on the number of mutations k that each individ- 
ual possesses. In this limit, a population-wide "mutation 
number distribution" /(fc, t) emerges in the same way 
that a fitness distribution f{X,t) arises from Eq. 
The stochastic dynamics in this case are governed by the 



dfjk) 
dt 



Uf{k-1) + Uf{k) 



m] \l^vik\t) 



(10) 



These dynamics are similar to the charge-ladder model 



introduced by Ohta and Kimura ( 1973 ) , which was ini 



tially created to model the early electrophoresis measure- 
ments of allelic diversity. This model later played an 
important role in the development of the neutral coa- 



lescent, which has since largely superseded it (Kingman 



1976 1982 2000 Moran 19751. Our approach below will 



have much in common with this standard neutral result, 
although some quantities are more convenient to calcu- 
late in one framework than the other. However, our de- 
scription in terms of fitness classes will lead to a natural 
generalization in the presence of selection, which is dif- 
ficult to incorporate into the standard coalescent model 



(Neuhauser and Krone 19971. 



The stochastic dynamics in Eq. (10 1 are free of the 
nonlinearities that plagued our earlier analysis of Eq. (|6| , 
and the resulting equation for the average profile (/(fc, t)) 
closes: 



difjk)) 
dt 



u{f{k~i))^u{fm 



(11) 



This differential equation is straightforward to solve, and 
under the assumption that all individuals start with zero 
mutations at time t — 0, we find that 



ki 



-ut 



(12) 



Thus, the mean of this distribution accumulates muta- 
tions at a constant rate C/, which agrees with the stan- 
dard calculation that assumes that each neutral mutation 
fixes independently. As i — > oo, the width of this distri- 
bution grows larger and larger, and in order to conserve 
probability, (/(fc, t)) approaches the trivial solution 



lim (/(fc,t)) =0. 



(13) 



A similar observation was made previously in the context 
of the charge-ladder model, which reflects the fact that 



this earlier model and the one defined by Eq. (10 1 have 
no true stationary distribution. Intuitively, this degen- 
erate behavior is an artifact of the averaging process we 
used in order to calculate (/(fc,t)). While the average 
rate of mutation accumulation is simply the mutation 
rate U , the actual rate for any paticular population will 
tend to fluctuate around this value, and the location k[t) 
will become increasingly uncertain with time. By calcu- 
lating the average {f{k,t)) as t — t- oo, we are effectively 
averaging many independent distributions whose centers 
are distributed across a large region of /c, and hence the 



average number of individuals at any particular k tends 
to zero. This line of reasoning is not specific to the neu- 
tral limit considered in this section, but is in fact a gen- 
eral property of any fitness distribution whose absolute 
location is subject to stochastic fluctuations. 

In all of these cases, the average distribution at long 
times is a poor summary of the typical distribution found 
in a random population. For example, while the width 



of the average distribution in Eq. (12 1 increases without 



bound, a simple argument from neutral coalescent theory 
shows the average width of the population fitness distri- 
bution has a finite extent as i ^ oo. A random pair of 
individuals in the population will typically share a com- 
mon ancestor T2 ^ N generations ago, so the difference 
between the number of mutations accumulated since the 
common ancestor is on the order of NU . We can see this 
in our current framework by simply measuring the num- 
ber of mutations in each individual relative to the mean 
number of mutations in the population at any given time. 
In particular, we can examine the variance in the number 
of mutations within the population, which is defined by 



(14) 



Due to the presence of the k terms within this defintion, 
cr^ is not just a simple linear function of the class sizes 
/(fc), and the rate of change of the average variance {V) 
cannot be written as a function of the average class sizes 
{f{k)) alone. Nevertheless, we can use the stochastic dy- 
namics in Eq. ( 10 1 to show that the differential equation 



for (cr^) does close on itself, and we find that 



dt 



= U - 



N 



(15) 



Again, assuming that all individuals start out with zero 
mutations at time t — Q, this equation yields the simple 
solution 



{am=Nu{l-e-''^) 




iit^N, 
if t > iV. 



(16) 



Thus, we see that the variance attains an equilibrium 
value (cr^) = NU, as expected from our coalescent argu- 
ments, and it does so on the coalescent timescale T2 ^ N . 
For t ^ T2, the population continues to accumulate 
mutations at the same steady-state rate U , but it does 
so with the relatively constant shape dictated by this 
mutation-drift balance. On the other hand, for t <^ T2 
the average variance is essentially given by the determin- 
istic estimate Ut obtained from Eq. (12). We argued 



earlier that this average distribution becomes unreliable 
when the uncertainty in the location of the mean be- 
comes comparable to the width of a typical distribution. 
Our Langevin framework allows us to make this argu- 
ment more explicit, since we can directly show that the 




360 400 440 

# mutations, /c 

FIG. 2 The distribution of the number of neutral mutations 
within the population after t = 8N generations when NU = 
50. The colored bars denote the results of two independent 
realizations of the stochastic dynamics in Eq. (10 1, and the 



dashed line is proportional to the average profile {fk{t)} from 
Eq. (121. [The vertical scale has been adjusted to improve 
visibi ity, and in reality all three distributions have the same 
total area.l 



variance in k obeys the differential equation 
aVar(fc) 1 



dt 



N 



and hence 



Var(fc) ^Ut- NU [l- e"*/^) 



ut ( ± 

2 \N 



if t < iV 
NU{jf) iit:$>N 



(17) 

(18) 
(19) 



Thus, when t ^ N, the uncertainty in k is on the order 
of the variance /12 within a typical population, and the 
width of the average distribution {f{k,t)) will start to 
be dominated by the uncertainty in the mean. On much 
longer timescales, the distribution of mutations within 
a typical population will have a relatively tight width 
(cr^) — NU and a mean k which moves deterministically 
towards higher mutation number at rate U, but which 
diffuses around this average position with diffusion con- 
stant U. 



A. Higher moments and correlations 

Of course, the substitution rate and the steady-state 
variance (cr^) can be calculated by other means, without 
the need for the complicated machinery of the Langevin 



equation in Eq. ( 10 ). The real utility of this approach is 



that it allows us to calculate higher moments of the distri- 
bution of mutations that are inaccessible by these other 
methods. Motivated by our discussion of the variance 
in mutation number, we consider the family of central 
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moments, which are defined by 



(20) 



for m > 0. We note that by definition, Mq = 1 and Mi ~ 
0, while M2 is simply the variance (cr^) discussed above. 
In Appendix C, we show that these central moments are 
governed by the compact equation 



5M„ 




Me - mM^ 

{k-wnk)-Y.[k-kr-^f{k)\ 

k I 

(21) 



where we have rescaled time by r = t/N . Unfortunately, 
the nonlinear term on the right implies that these equa- 
tions do not close when m > 4. We can only obtain a 
closed system by considering more complicated products 
of the form 



m 1 , . . . , m ,7 



n 



L k 



{k-kr^f{k,t) 



which have the general property that 



(22) 



(23) 



The equations of motion for the first few moments m < 4 
are relatively simple, and were first analyzed by |Higgs| 
and Woodcock (1995). In our present notation, they 



showed that 
dM2 

dMs 
Ot 
dMi 

dM2a 
dr 



NU - M2 
NU - 3M3 
NU + 6NUM2 



6M2 



AMa 



2NUM2 - 3M2 2 + M4 



(24) 
(25) 
(26) 
(27) 



although one can technically include the fifth order mo- 
ments 



dr 

dM2^ 

dr 



NU + lONU (M2 + M3) -I- 10M2,3 



NU {M2 + M3) - 8M2,3 + M5 



5M5 
(28) 

(29) 



before triple products of the form Mmi,m2,m3 start to ap- 
pear. This system of first-order linear differential equa- 
tions can be solved using standard Laplace transform 
methods, but in this case we are primarily interested in 
the steady-state behavior as t — > c». In this limit the 



time derivatives on the left-hand side vanish, and the re- 
sulting algebraic system can be easily solved to obtain 



M3 = 



M2 ^ NU 
NU 

M4 - 3M2,2 = -2{NUf 

lA{NUf + NU 



M2,2 = 



6 



(30) 

(31) 
(32) 

(33) 



The first three of these quantities coincide with the first 
few cumulants of /(fc, t), which (through the related skew 
and kurtosis) are often used to characterize the shape of 
a distribution. However, since these are random distri- 
butions, we must be careful about the averaging process 
that we use to compute these characteristic quantities. 
For example, the excess kurtosis — which is often used 
to measure the "peakedness" of the distribution and its 
deviations from normality — could conceivably be calcu- 
lated using any one of the four averages 



Ma 



3M| 



Mi - 3Af2,2 M4 - 3M2,2 



M- 



2,2 



fc)V(fe)] 



-3[E.(fc-fc)V(fc)]' 

fc)2/(fc)]' 



(34) 



(35) 



which each give slightly different results, even in the limit 
that NU — ?> 00. One could argue that this last definition 
is closest to the standard usage of the excess kurtosis, but 
it is unfortunately the most difficult to calculate. Calcu- 
lating the average of the inverse of a random variable 
typically requires us to first calculate all of its higher- 
order moments, which requires additional equations be- 
yond Eqs. ( [30p3l ). 

The second order product M2_2 in Eq. (33) can be 



used to calculate the variance in the width af. between 
independent populations through the relation 



Var(cr^,) = M2,2 - M^ 



8{NU)^ + NU 
6 



(36) 



This shows that the standard deviation in the typical 
variance at long times and large NU is approximately 



Std{al) - 1.15(iVC/), 



(37) 



which remains larger than its expected value (cr^.) — NU 
even as NU — > co. Thus, the variance in the number of 
mutations within the population is not self-averaging in 
the sense that does not "settle-down" to some fixed 
value in large populations. (One might naively expect 
this from the central limit theorem if the number of mu- 
tations in each individual was independent.) Instead, 
the typical spread in the number of mutations undergoes 
large fluctuations as the population continues to acquire 
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new mutations. The typical lifetime of these fluctuations 
can be measured from the autocorrelation function 

G(Ar)= \im [{aUr)4ir + At)) - {aUr)) {aUr + At))] 



Var(a^)e 



-At 



(38) 



which implies that these correlations decay in a simple 
manner on the coalescent timescale T2 N. 

Continuing the system in Eqs. ( 24p9 ) to central mo- 
ments with m > 5 starts to become complicated, since 
the moment equations for start to involve more and 



more of the generalized products in Eq. (22). Neverthe- 



less, the algebraic structure of these moment equations 
(which is derived in Appendix C) is such that the re- 
sulting system can be solved exactly in a straightforward 
manner with the help of a computer. The most impor- 
tant property of these moment equations is that the gen- 
eralized products with ^ rrij = m depend only on those 
products with total order less than or equal to to. Thus, 
at any given order we have a finite system of linear equa- 
tions to solve. We can calculate these moments in an 
iterative manner. Given values for the generalized prod- 
ucts at order < to, we can calculate the moments at order 
TO -|- 1 by solving the matrix equation 



M, 



m+l 



b{NU,Mi,...,M^). 



(39) 



where Mm is the collection of generalized products with 
order ^ mj — to, is a matrix of constants (indepen- 
dent of NU or any of the moments), and 6 is a vector- 
valued function of NU and the lower-order moments. 
The entries of the matrix A™, whose size is given by the 
number of generalized products at order to, can be de- 
termined directly by inspection from the system of equa- 
tions in Appendix C. The resulting matrix must only be 
inverted once for each to, and then the analytical solu- 
tions for the various moments can be obtained by simple 
matrix multiplication. An implementation of this iter- 
ative algorithm in Python is available from the authors 
upon request. 

One can in principle use this algorithm to calculate 
the moments for arbitrary to, which will be in the form 
of some polynomial in NU similar to what we found for 
the first few moments in Eqs. ( 30p3 ). Typically, we 
will be interested in the limiting behavior for large NU, 
which we can access most easily by defining the rescaled 
moments 



M„ 



(Vivc/)5^™^- 



(40) 



In the limit that NU — > 00, the equations for the rescaled 
moments become independent of NU and so Mm can de- 
pend only on m. These rescaled moments can be calcu- 
lated from the same iterative scheme outlined above, and 
the results for the first thirty moments are shown in Fig. 
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FIG. 3 The central moments Mm as a function of m in the 
limit that NU — >■ 00. Symbols denote the exact numerical 
results calculated using the iterative scheme outlined in the 
text for m = 2, . . . , 30. The red line denotes the approximate 
scahng form, Mm ~ mM^NU)"^^^ /■mF where p « 1.93. 



[3] For large to, these moments obey the approximate 
scaling relation 



to! 

mP 



{Nuy 



V2 



(41) 



where the exponent p w 1.93 can be extracted from the 
plot in Fig. [3] Although these central moments grow 
more quickly than the corresponding central moments 
of a Gaussian distribution, they grow sufficiently slowly 
that the centered distribution of mutations remains a 
light-tailed distribution. This is in contrast to the analo- 
gous neutral limit for Fisher-KPP waves, where the prop- 
agating front displays a power-law shape due to the pe- 



riodic formation of smaller waves at the tip ( Hallatschek 



and Korolev 2009). However, many of these technical 
details are beyond the scope of the present paper. The 
main point of this discussion for to > 4 is simply that all 
of the central moments of this neutral distribution are 
exactly solvable, as long as one is willing to devote the 
time and computing power necessary to implement the 
iterative scheme described above. 



III. PERTURBATION THEORY FOR SELECTED 
MUTATIONS 

Although some properties of the distribution of neutral 
mutations are interesting in their own right, previously 
developed methods like the neutral coalescent offer a sim- 
pler and more direct way to quantify the genetic diversity 
at the sequence level. Instead, the true utility of the ex- 
actly solvable model in the previous section lies in the 
fact that — unlike these earlier methods — our fitness- 
class description can be easily generalized to calculate 
the corrections that arise when selection is present. As 
an example, suppose the neutral mutations in the previ- 
ous section now have a constant fitness effect s. Selection 
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on these mutations leads to an additional term +s{k — k) 
on the right-hand side of the stochastic dynamics in Eq. 
(10 1. Under these modified dynamics, the substitution 



rate R = d{k) /dt for these mutations is now given by 



R = U + sA'h 



(42) 



which now depends on the variance in k in addition to 
the neutral accumulation rate U . The variance M2 will 
in turn depend upon the skew M3, and so on in the in- 
finite hierarchy of moment equations mentioned earlier. 
However, if the strength of selection is weak and Ns <C 1, 
the contribution to the variance from the M3 term will 
be small, and the variance will be approximately equal 
to the neutral result M2 = NU . Thus, by neglecting 
the selection term in the calculation of M2 , we obtain an 
approximate expression for the substitution rate 



Rk.U(1 + Ns) 



(43) 



valid in the limit that Ns — where the Ns term is a 
small perturbative correction to the neutral result R — U. 
In this way, the exact results for the neutral wave can 
serve as a basis for a perturbative analysis of the ef- 
fects of selection in increasing powers of Ns. We note 
that this limit is quite distinct from the weak-selection 



regime analyzed by Kimura ( 1955 ) for a single locus. 



and it is equally removed from the quasi-linked, weak- 
selection regime studied by Nagylaki (1993). By using 



the full multi-locus neutral limit as our starting point, 
we are able to analyze the selective corrections to all the 
sites simultaneously, while fully preserving the effects of 
linkage and interference found in the neutral case. Our 
analysis, which is anticipated to some extent in |Higgs| 
and Woodcock (1995), is more similar to the perturba- 



tive treatment of noisy Fisher waves in Hallatschek and 



Korolev (2009) 



In order for our perturbative scheme to apply to the 
more general populations described by Eq. ([6]), we must 
make some small modifications to our treatment of the 
neutral dynamics in order to properly account for distri- 
butions of fitness effects. In our analysis above, it was 
most natural to divide the population into fitness classes 
according to the discrete number of mutations k in each 
individual. Now it will be convenient to consider k to be 
a continuous variable, which is related to the fitness 



X = sk 



(44) 



through an overall constant of proportionality s that pa- 
rameterizes the strength of selection. The distribution 
of fitness effects p{s) can then be alternatively viewed as 
a distribution of "/c-effects," which we denote by p{Ak). 
With these definitions, our model in Eq. ^ can be ex- 
plicitly rewritten in "fc-space" as 



dm 

dr 



Ns{k - k)f{k) + NU d{Ak) p{Ak) [f{k - Afc) - f{k)] + / dk' [S{k' - k) - f{k)] \/ .f{k')r){k') . (45) 



where it is now clear that the nonlinear selection term is 
a perturbative correction with a well-defined limit when 
Ns = 0. Proceeding along the lines of the previous sec- 
tion, this stochastic differential equation yields an analo- 
gous set of differential equations for the central moments 
and the generalized moment products (see Appendix C). 



The first few orders [which were obtained by Higgs and 
Woodcock] (|1995[) and|Etheridge et alT](|2007[) for a similar 



model] are given by 



dM2 
dt 

dM2,2 

dr 



NU{Ak^) - M2 + NsMs (46) 

NU{Ak^) - 3Af3 -f Ns{pi - 3p2) (47) 

NU{Ak^) + 6NU{Ak^)M2 

+ QM2,2 - 4M4 + 0{Ns) (48) 

2NU{Ak^)M2 + Mi - 3M2,2 + 0{Ns) (49) 



I 

ment of the distribution of fitness effects. Thus, in the 
presence of selection the moments at order m now include 
terms that depend on the moments at order m + l. The 
resulting system of equations cannot be solved at any 
fixed order because it always depends on the moments at 
a still-higher order. 

While this lack of closure among the moment equations 
makes it difficult to obtain a closed-form solution for any 
particular moment, it naturally suggests a perturbative 
approach similar to the R ~ C/(l -I- Ns) approximation 
above. In particular, we assume that in the limit Ns — > 0, 
each of the central moments Mm admits an asymptotic 
expansion of the form 



M„ 



00 



M^\Nsy (Ns^O) 



(50) 



where Mm is a numerical coefficient that depends only 
on NU. A similar expansion is assumed for the general- 
ized products in Eq. (22 ). Substituting these expressions 



where (Afc™) — J d{Ak) (Ak)'^ p(Ak) denotes mth mo- into the moment equations and grouping terms in powers 
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of Ns, we can obtain a generalized system of equations 
for the coefficients Mm\ which is listed in Appendix C. 
The important feature of these equations is that unlike 
the case for the full moments Aim, the equations for the 
coefficients Mm close for a given order j and m. The al- 
gebraic properties of these equations are again sufhciently 
simple that they only lead to a slightly more complicated 



version of Eq. ( 39 ) 



^m+l 



M. 



m+l 



(51) 



where A and b are the same as in the neutral case, and 
c is a vector- valued function of the moments at the next- 
lowest order in j. Thus, the iterative procedure outlined 
for the neutral case can be easily generalized to calculate 
the coefficients Mm^ order-by-order for arbitrary m and 
j. An implementation written in Python is available from 
the authors upon request. 

As an alternative to this explicit order-by-order cal- 
culation, we note that the coefficients in the asymptotic 
expansion in Eq. (50) are unique (Hinch, 1991), so we 



can also obtain these coefficients simply by dropping the 
selection term in the moment hierarchy at the desired or- 
der, solving the resulting finite system of equations, and 
then reexpanding the solution in powers of Ns. Applying 
this procedure to the moment equations in Eqs. ( 46p9 ), 
we obtain the first few corrections to the population vari- 
ance in fitness 



M2 = NU{Ak^ 



2[A^C/(Afc^)]2 



(52) 



(A^s)2 + 0{Nsf 



and hence the rate of adaptation v = ^^^jp- is given by 



Us 



{Ak)+Ns{Ak'^ 



{Ns)^{Ak^ 



2[NU{Ak^){Nsf][Ns{Ak^ 



0{Ns)'^ 



The first three terms in this expansion are exactly what 
one would obtain by assuming that the individual sites 
evolve independently, in which case the rate of fitness in- 
crease would simply be the sum of the single-locus adap- 
tation rates that can be calculated from Eq. The 
fourth term in this expansion represents a fundamentally 
new correction that arises solely from the accumulation 
of selected mutations over many different sites. This term 
is proportional to the variance in fitness within the popu- 
lation, so we see that the rate of adaptation is reduced in 
populations with a larger number of selected mutations 
(see Fig. |4]) because many of these mutations will be 
lost to clonal interference before they can fix. Similarly, 




FIG. 4 The scaled substitution rate i? as a function of the 
zeroth-order fitness variance (Naf = NU{Nsf. Symbols 
denote the results of forward-time simulations for NU — 10 
(black), NU = 50 (blue), and NU = 300 (red), and the solid 
lines give the predictions from Eq. (53 1. Upper triangles de- 



note populations with a purely beneficial distribution of fit- 
ness effects p(Afc) = S{Ak — 1), while the lower triangles give 
the corresponding deleterious distribution p(Ak) = 5{Ak + l). 
Our predictions start to diverge near Na = 1, when we expect 
our perturbation expansion to break down. 



in populations which are accumulating deleterious muta- 
tions due to Muller's ratchet, this interference term leads 
to an increase in the rate of Muller's ratchet (again, see 
Fig. |4| due to the increased importance of fluctuations 
in the high-fitness nose of the population. 

In addition to the mean rate of adaptation, we can use 
this perturbative scheme to calculate the fluctuations in 
the rate of adaptation as well, which has so far been 
accessible only through heuristic arguments. Using the 
dynamics in Eq. ( [45^ , we can construct similar moment 
hierarchy for Var(fc) and its relatives, and the first few 
orders are given by 



9Var(fc) 
_dT 
dCov(k,M2) 
dr 



(53) dCov{k, M3) 



dr 



= M2 + 2NsCov{k, M2) (54) 

= -Cov(fc,M2) + Afg 

+ Ns [Var(M2) + Cov(fc, M3)] (55) 

= -3Cov(^, M3) + M4 - 3Af2,2 + 0{Ns) 

(56) 



Thus, at long times i — >■ 00, the mean fitness fluctuates 
diffusively 



Var(fc) - 2Dt {t 
with diffusion constant 
U{Ak^ 



00) , 



D 



1 + i^Ns - 

(Ak^) 3 



NU{Ak'^){Ns 



(57) 



(58) 



+ OiNsf 
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Again, we see that interference between the various lin- 
eages leads to a reduction in the diffusivity of the mean 
fitness that is proportional to the variance in fitness 
within the population. This leads to a novel prediction in 
the case of the dynamic mutation-selection balance dis- 



cussed in Goyal et al. (2012), where a balance between 



beneficial and deleterious substitutions halts any global 
fitness change in the population. The results in Eq. (58) 



show that even though the average fitness change in these 
populations is zero, we still expect the distribution to 
wander diffusively around this fixed point, with diffusion 
constant 



U 
'2 



1 4- Nsil - 2ec) - 



NUiNsf 



0(Nsf 
(59) 



in the limit that Ns — ^ 0, where Cc is the critical ratio 
of beneficial to deleterious mutations. This scenario is 
discussed in more detail in Appendix A. 

IV. APPLICATION TO SEQUENCE EVOLUTION 

Our analysis so far has focused on population-wide 
properties of the fitness distribution and important as- 
pects of the evolutionary dynamics such as the rate of 
adaptation. In the introduction however, we were pri- 



marily interested in predicting evolutionary fates and di- 
versity at the sequence level, which we have so far ne- 
glected. In the present section, we will demonstrate how 
this fitness-class description can also provide a window 
into the evolutionary dynamics of sequences within a par- 
ticular population. 



A. Fate of a focal lineage 

As an example, we consider the fate of some clonal lin- 
eage within the population which has an initial frequency 
p and fitness Xq = sfco a-t time t = Q. This lineage could 
consist of a group of individuals that share a common 
point mutation at a particular site or it could alterna- 
tively represent some fluorescently labeled "marker pop- 
ulation" whose dynamics we wish to follow. Now in addi- 
tion to tracking the fitness of each individual in the popu- 
lation, we must also keep track of whether these individ- 
uals are descended from this particular lineage or from 
the background population. We can accomplish this by 
dividing the occupation densities f{k, t) into two classes 
fi{k,t) and f(){k,t) which contain individuals descended 
from the focal lineage and the background population re- 
spectively. This division requires a small modification to 



our original dynamics in Eq. ( 45 ) , 



dfijk) 
dr 



Ns{k - k)U{k) + NU I d{Ak)p{Ak) [f,{k - Ak) - f{k)] + J2 i^ik' - mo " h{k)] ^/fMv^ik) , (60) 



r 



but otherwise the stochastic dynamics are essentially the 
same. Now in addition to the population-wide central 
moments , we can also introduce a new set of central 
moments 



dk{k~ k)"'fi{k,t) 



(61) 



that are specific to the focal lineage. The lowest-order 
moment Fo{t) represents the average total fraction of the 
population that is descended from the focal lineage (e.g., 
the frequency of a particular SNP within the popula- 
tion). At long times, one of two things can happen: ei- 
ther the focal lineage is outcompeted by the background 
and / dkfi{k,t) = 0, or its descendants take over the 
population and J dk fi{k,t) = 1. Thus, the probability 
of fixation is given by 



pnx = lim Fo{t) . 

t^OO 



(62) 



Using the dynamics in Eq. (|60|) and the rules of our 
stochastic calculus, we obtain an additional moment hier- 



archy for the focal moments Fm- As we show in Appendix 
D, this requires us to consider generalized moments of the 
form 




dk{k-k)"'^fi{k,t) 



J2 J dkik-kr-Mk,t) 



(63) 



where the m indices denote moments specific to the fo- 
cal lineage and the n indices denote the population- wide 
moments Mm discussed earlier. In this notation, the first 
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few orders of the moment hierarchy for _Fo are given by 
dFo 



dr 
dFi 

Ih 

dr 

dFo,2 
dr 

m 

dr 

dFo,3 
dr 

dF^,2 
dr 



NsFi (64) 

-^^1 + Ns {F2 - Fo;2) (65) 

NU{Ak^}Fo + Fo.,2 - 2F2 + Ns {F3 ~ 2Fi,2) 

(66) 

NU{Ak''}Fo + F2~ 2Fo,2 + Ns {Fi,2 + Fq.s) 

(67) 

NU{Ak^)Fo + 3NU{Ak^)Fi 
- 3F3 + 3Fi,2 + 0{Ns) (68) 

NU{Ak'^}Fo + F3- 4Fo;3 - 3Fi;2 + 0{Ns) 

(69) 

NU{Ak^)Fi +F3- Fo.,3 - iFi,2 + 0{Ns) 

(70) 



We are interested in the hmiting behavior at long times, 
which is most easily obtained via the Laplace transformed 
moments 

F^.niz)^ J e-'^Frn-M{T)dT, (71) 

which satisfy the identity 

Fo(T)^Fo(0)+iVs-Fi(2 = l/r), (72) 

as T — > 00. Taking the Laplace transform of the first 
few equations in the moment hierarchy and truncating 
the selection terms at order 0{Ns)'^, we can immediately 
conclude that 

(Ns)^ 

Pfix = FoiO) + NsF,{0) + [F2(0) - Fo;2(0)] 



(Nsf 



[Fi.2(0) + NU{Ak^)] + 0{Ns)^ 



(73) 



The initial conditions for the various moments are given 
by 



Fr,iO)=p{l-p)"^[ko-{k)^^] 



M„(0) = ^^„,(0) + (1 - P) E ( 7 ) - (^)^s] 



£=0 



X {Mm-e)i 



(75) 



where 



(fc)bg = 5]fc/o(fc,0), (76) 

fe 

(M™)bg - ^[fc - (k%,rfo{k,0) (77) 



denote the mean fitness and the central moments of the 
background popuation, respectively. For a lineage cre- 
ated by a spontaneous mutation, the initial frequency is 
just p — 1/N. The initial fitness sk^ is obtained as a ran- 
dom draw from the population fitness distribution plus 
the fitness effect sAk of the mutation. After averaging 
over the possible fitness backgrounds that this mutation 
could have arisen on, we find that 

[ko - (kh,r ^ E 7 (^fc)'(^^"-^)bg (78) 

Averaging over the background moments {Mm)hg simply 
yields the steady-state moments that we derived in the 
previous section. Thus, in the limit of large population 
sizes we have 



1 ™ / \ 



M™(0) = M„ 



(79) 
(80) 



and the fixation probability for a spontaneous mutation 
with effect Afc is given by 



Pfix(Afc) = 



1 + NsAk 



(NsAk)"^ 



2[NU{Ak^) (iVs)2] [NsAk] 



0{Ns)^ 
(81) 



Again, we see that the first three terms are identical to 
the single-locus result in Eq. ([2]). We obtain the lowest- 
order "interference correction" in the fourth term, which 
reduces the probability of fixation of a benficial muta- 
tion in a way that is directly proportional to the average 
variance in fitness within the population at the time of 
the mutation. For a deleterious mutation, this correction 
term actually increases the fixation probability because 
the mutant could find itself on an anomalously fit back- 
ground (thus mitigating some of the effect of the deleteri- 
ous mutation). We recover the standard neutral fixation 
probability pgx = 1/iV when Afc = 0. 

Noting the similarity between the fixation probability 
in Eq. (81 1 and the rate of adaptation in Eq. (53 1, we 



(74) 

see that the relation 



J NU ■ sAk -pfi^iAk) ■ p{Ak)d{Ak) (82) 

holds at least through the first few orders in Ns. This re- 
lation has formed the basis for several studies of the evo- 



lutionary dynamics under strong selection (Good et al 



2012 Hallatschek 2011 Neher et al. 2010), with the ad- 



ditional "mean-field" ansatz 



Pfix(Afc) « / dx {fix - sAk)) ■ pMifix))) . (83) 
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Here, pfi^{x\{f{x))) denotes the fixation probability for 
a new mutation with relative fitness a;, given that the 
centered fitness distribution of the rest of the population 
is (/(x)). However, we see that in the present regime, 
this mean- field ansatz is not quite correct. Instead, we 
require the slightly more complicated average 

pUAk) ^ Jdx {fix - sAk) ■ pfi,(x|/(x))) , (84) 

which jointly considers the fluctuations in the fitness 
background of the mutant as well as the fluctuations in 
the fitnesses of its competing lineages. Like the other cor- 
related quantities we have considered in the present work, 
this average more or less decouples in the strong selection 
limit Ns oo, and we recover the "mean-field" ansatz 
in Eq. ( 83 1 . But in the weak selection regime considered 
here, these correlated fiuctuations start to become more 



important, and Eq. (84) is required in order to correctly 



account for the population-level dynamics. 



B. Diversity at a focal site 

In addition to predicting the ultimate fate of a se- 
quence, these focal lineage dynamics can also be used 
to predict the average heterozygosity at a particular site 
along the genome and therefore the overall levels of se- 
quence diversity in the population. We consider a par- 
ticular site within the genome with a per-site mutation 
rate /i and scaled fitness effect Ak. This site will be poly- 
morphic in a randomly sampled pair of individuals if and 
only if (1) this site mutated at some time t in the past 
and (2) exactly one member of the pair was drawn from 
the mutant lineage, which has size J dk fi{k,t) in the 
present. After averaging overall possible mutation times 
(and taking note of the fact that the backward-time mu- 
tation process is Poisson), we find that 



2(^1 dkhik)^ j dkh{k) 



(85) 



In the infinite-sites limit where N fi — >■ 0, this yields the 
relation 

/•oo 

TT = 2N^i i NH{T)dT (86) 
where we have defined the heterozygosity function 



H{t)^^I^J dkhik,t)j l^l- J dkf,{k,t)^ 

= Fo{t) - Fo,oir) (87) 



Again, we can use the dynamics in Eq. (60) to construct 



gosity, and the first few orders are given by 
dH 



dr 

a(fi~2Fo,i) 
Ot 

dFoa 
dr 

dr 

dr 



-H + Ns{Fi-2Foa) 

-3 (^^1 - 2^^0,1) + Ns {F2 - 2Fo,2 - i^0;2) 

+ iVs(2Fo,o;2-2Fi,i) (89) 



NU {Ak^) {Fa - H) + F2 + F[ 



0,0:2 



- 3i^o,2 - 2Fi4 + 0{Ns) 

NU{Ak^){Fn-H) + Fo,2 

- 4Fo,o;2 + 2^0.2 + 0{Ns) 

— 3i^l ,1 + F2 — 2Fo 2 + Fo n-2 



(90) 

(91) 

0{Ns) 
(92) 



Truncating the hierarchy and solving the Laplace trans- 
formed equations, we obtain 



2N^i 



1 



NsAk 2NU{Ak'^){Nsy 



-0{Nsf 
(93) 



a similar hierarchy of moment equations for the heterozy- 



Again, the first two terms in this expansion are equivalent 
to the single- locus results in Eq. ([3|. At third order in 
Ns, we obtain the lowest-order correction due to interfer- 
ence at neighboring sites, which reduces the diversity at a 
particular site according to the variance in fitness within 
the population. This reduction in diversity is seen even 
at putatively neutral or synonymous sites that are not 
otherwise selected on their own. 



V. DISCUSSION 

Although natural selection acts on the genome as a 
whole, the effects of selection at a large number of linked 
sites are only beginning to be characterized. Recent stud- 
ies have identified the distribution of fitnesses within the 
population as a key mediator for these effects, but our 
understanding of this distribution remains limited to a 
few special cases where the strength of selection is strong 
and genetic drift is correspondingly weak. Here, we have 
introduced a general method for analyzing the effects of 
selection at many linked loci, which incorporates linkage 
and drift exactly while treating the global strength of 
selection as a perturbative correction. This framework 
allows us to investigate the stochastic behavior of the fit- 
ness distribution in a regime where the fluctuations due 
to drift are especially strong, and it fills an important 
gap in our theoretical understanding of linked selection 
in the approach to the neutral limit. 

As a quantitative theory, the present framework suffers 
from several shortcomings that may limit its direct appli- 
cability to data from natural populations. Our perturba- 
tive approach gives predictions for various quantities in 
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terms of an asymptotic series in the limit that Ns — >■ 0, 
which means that for a fixed number of terms in this se- 
ries, the resulting formulae will only become valid once 
Ns is sufficiently small. Moreover, these aysmptotic ex- 
pressions are nonuniform as a function of the mutation 
rate NU, and in general for larger mutation rates we 
require ever smaller values of Ns for our expressions to 
remain accurate. In reality, these asymptotic series could 
be more accurately described as an expansion in powers 
of the typical fitness variance Na Nsy/NU, valid in 
the hmit that iVcr < 1. 

It remains an open question exactly what values of 
Na are relevant for natural populations. Indeed, one 
of the major motivating factors behind this quantitative 
approach to linked selection is to eventually use these 
theoretical tools to infer Na directly from DNA polymor- 
phism data in sampled populations. Because the vast ma- 
jority of new mutations are thought to be either neutral 
or weakly deleterious, there has been speculation that 
evolution at the sequence level is dominated by these 
"nearly- neutral" mutations with Ns < 1 (Ohta 19921, 
although it is unclear whether these selection coefficients 
lead to an Na that is sufficiently small for our results 
to apply. In principle, the range of applicability of our 
expressions can be improved by including more terms 
in the expansion, but there is typically an upper limit 
to the radius of convergence that can be achieved this 



fects of selection at neighboring sites (Hill and Robert 



way (Hinch 1991). However, because we have outlined 
a method for calculating successively higher-order terms 
programatically, series improvement methods could po- 
tentially be used to extend the radius of convergence, 
even for Na > 1 (Dyke 1974 Song and Steinriicken 



2012). This constitutes an interesting direction for fu- 
ture work. 

While the experimental applicability of these perturba- 
tion methods may be limited, they nevertheless provide 
a valuable qualitative window into the effects of selec- 
tion at many linked sites, and the exact nature of the 
selective corrections allows us to address a number of 
longstanding assumptions in the population genetics lit- 
erature. Chief among these is the independent-sites as- 
sumption that is frequently used to model selection at 
individual sites along the genome. Somewhat surpris- 
ingly, we have demonstrated here that this assumption 
is valid not only in the purely neutral case, but also fre- 
quently through the first-order selective correction. At 
higher-orders, however, we start to obtain terms that de- 
pend on NU, and more generally, the variance in fitness 
maintained within the population. These terms repre- 
sent corrections that arise solely from the interactions 
between the selected sites, and cannot be predicted from 
any single-locus theory. 

Of course, the standard assumption is not that linked 
sites evolve in this strictly independent fashion, but that 
they evolve independently at a reduced effective popula- 
tion size Ne, which is supposed to encapsulate the ef- 



son 



1966 ). However, several recent studies have begun to 



challenge this assumption ( 


Comeron and Kreitman 


2002 


Good and Desai 


2012 


Santiago and Caballero 


199S 


),of- 



ten on the grounds that a different A^e must be defined 
for every quantity we wish to predict. This shortcoming 
is apparent from our present analysis as well, and our 
analytical corrections provide an explicit demonstration. 

The effective population size is most commonly mea- 
sured from the diversity at putatively neutral or synony- 
mous sites. To lowest order in Ns, our analysis of the 
pairwise heterozygosity yields a corresponding effective 
population size 



A^, = A^ 



1 - 



2NU{Ns) 



(94) 



which is reduced in the presence of selection as expected. 
To lowest order, this same A'e correctly predicts the re- 
duction of heterozygosity at selected sites as well. Al- 
ternatively, we could define A^e by measuring the diver- 
gence (i.e., frequency of nucleotide substitutions) at var- 
ious sites under selection, which depends on the fixation 
probability of a new mutant. To lowest order in Ns, this 
yields an effective population size 



A^, = A 



2NU{Ns 



(95) 



which is also reduced by selection at linked sites, but 
at slightly faster rate than in Eq. (94). Thus, we re- 
quire a different A^e to account for linkage depending 
upon whether we wish to predict tt, p^x, or some other 
sequence-based statistic. While the effective population 
size can still be used in the technical sense on a per- 
quantity basis, these results imply that its predictive or 
explanatory power is greatly reduced, and that the effects 
of linked selection are more complicated than a simple in- 
crease in genetic drift would suggest. 

In this way, the selective corrections obtained here can 
be extremely useful from a model-building standpoint, 
even when we wish to ultimately apply these models in 
regions where the perturbative approach breaks down. 
These corrections are straightforward to calculate for 
any quantity with a well-defined neutral limit, and be- 
cause they are exact, any other model describing weakly 
selected mutations should recover these expressions as 
Ns — >■ 0. Many aspects of natural selection at the se- 
quence level remain poorly understood, and exact re- 
sults are few and far between. It is our hope that the 
methods outlined in the present work can be used as a 
stepping-stone to identify and evaluate those approxima- 
tions which will lead to further progress on this important 
problem. 
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Appendix A: Finite site effects 

In the main text, we worked exclusively in the large-genome limit, where the number of sites was so large (and the 
per-site mutation rate so low) that we could focus on an intermediate asymptotic regime where back mutations could 
be neglected, and the mutation rate and distribution of fitness effects was independent of the previous mutations 
within a particular genome. In the present section we now consider what happens when we start to relax these 
assumptions. In particular, we assume that the genome has some finite size L and that the per-site mutation rates 
are now sufficiently large that the scaled product N jj. at each site is finite. For simplicity, we also assume a constant 



fitness effect for mutations at each site. This is similar to the model analyzed in Woodcock and Higgs ( 1996 ) and 



Rouzine et al. (2003). 



In this case, the population can still be partitioned according to the number of mutations A; = 0, . . . , L in each 
individual, but now we must account for the fact that the distribution of fc-effects depends on k in addition to the 
fitness. An individual with k mutations can mutate at another site at rate /i(L — k), in which case fc — > fc + 1. This 
individual can also experience a back- mutation at one of its fc mutated sites at rate /ifc, in which case fc — >■ fc — 1. 
The total rate for one of these two events to happen is of course just U = fiL. Thus, the fitness classes /(fc) evolve 
according to the stochastic dynamics 



dfjk) 
dr 



= Ns{k - k)J{k) + Nfi{L - fc + l)/(fc - 1) + iV/i(fc + l)/(fc + 1) - NfiLfik) 



(Al) 



In constrast to the L — > oo case analyzed in the main text, the behavior of the average profile (/(fc)) in the neutral 
limit no longer degenerate, and we find that 



hm (/(fc,r)> 



L 



(A2) 



This is just a binomial distribution with mean L/2 and variance L/4, which is consistent with the intuition that the 
population at long times consists of individuals with independent and identically distributed mutations along the L 
sites in the genome. Although there is no longer any infinite-width "red-flag" to suggest that fluctuations may play 



an important role here, the absence of any Nn dependence in Eq. (A2| is suspicious, since we would expect that the 
typical width of the distribution should vanish as Nfi — > 0. 

Thus, we are lead to consider the behavior of the mean (fc) and the central moments that we analyzed in the 



infinite-sites model. Using the dynamics in Eq. ( Al I it is straightforward to show that 



d{k) 



= NsNh +Nfi/Y, ['^(^ + - 1) + fc + l/(fc + 1) - Lf{k)] 



= NsAh + NfiL 



1 - 



2k 



(A3) 
(A4) 



so that in the neutral limit the population reaches mutation-reversio n ba lance when the average number of mutations 
in each genome is (fc) = L/2, just like the average profile in Eq. (A2|. We also see that this equilibrium point is 



reached on a timescale teq ^ ^/'^fJ'- For the central moments Af,„, it is straightforward to show that the equations 
become 



Me 



m 
2 

m-2 



M2,m-2 - mM„i + Ns {Mrn+1 ~ TOAf2,m-l) 



i=0 



(A5) 



where the last two terms arise from the fc-dependent mutation rates. The first few orders of the moment hierarchy 
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are given by 



dM2 

dM2,2 

dr 



NfiL - [1 + AN^i] M2 + NsMs 

Nfi [L - 2(k)] - 3 [1 + 2N^] M3 + Ns [M4 - SMz^a] 
NfiL + [6NfiL - 8N^i] M2 + 6A//2,2 - 4 [1 + 2Nfi] M4 
2N^iLM2 - [3 + 8Nfj] A/2,2 + ^4 + 0{Ns) 



0{Ns) 



Thus, in the neutral hmit, the actual variance in k within the population is given by 



M2 = 



1 + 47V/i 



(A6) 
(A7) 
(A8) 
(A9) 

(AlO) 



which only approaches the deterministic value of L/4 when N ^ ^ 1. For smaller per-site mutation rates, the width of 
the fitness distribution can be much smaller than this, and for N ^ ^ 1 it approaches the infinite-sites limit M2 = NfiL 
that we analyzed in the main text. We can apply our perturbative approach to this moment hierarchy as well, which 
shows that when the mutants are weakly beneficial the equilibrium value of (k) is given by 



(k) 



L 

2 



Ns 



2Ns 



1 + ANn 



Ns 



N^L 



AN + 16{N fi)' 



1 + ANfiJ \3+ fN^l+f{N^x)■^ + ^{Nfi)3 



0{NsY 



(All) 



Following Woodcock and Higgs (1996) and Goyal et al. (2012), we define ec to be the fraction of all the possible 

(A12) 



mutations that are beneficial at this equilibrium point, or 



= 1 - 



This allows us to rewrite Eq. (All) in terms of this critical ratio as 

2 



Ns 



2Ns 



Ns 



NfiL 



ANn+16{N^iy 



fN,- 



0{Ns)'^ 



1 + ANfi 3 V 1 + 4:N^ 
In the limit that N/j, — >■ and L — > cx) with NU — NfiL fixed, we recover the infinite sites preduction 

{Nsf 2NU{Nsf 



1 - A^s + 



OiNsy 



(A13) 



(A14) 



3 3 

which provides a more accurate expression for the critical fraction as Ns — > compared to the corresponding expression 



Goyal et al. (2012). 



Appendix B: Stochastic Calculus 



In this section, we outline the stochastic (Ito) calculus that is used to derive moment equations from the stochastic 
dynamics in Eqs. ([l]), ([6|, and (10 1. For concretness, we will restrict our attention to the neutral dynamics in Eq. 



(10), but these results will apply more generally. Let <^({/fc}) be some arbitrary function of the fitness classes, fk(t). 



We wish to find an expression for the time-evolution of the mean {4>i{fk})) using the definition 



dt 



~ lim 

St^Q 



mfk{t+st)})-mfk{t)})) 

St 



(Bl) 



The dynamics in Eq. (10 1 are essentially just a shorthand notation for calculating fk{t + St) conditioned on fkit). 
Taking care to note that the drift term in Eq. ( 10 ) is of the Ito form, the dynamics in Eq. ( 10 ) imply that 



fk{t + St) = fkit) + St [Ufik -l,t) + Uf{k, t)] 



Vn 



(B2) 
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The value of (f>{{fk{t + St)}) can then be found by Taylor expansion in powers of St. If is just a singleton 
function (j){{fk}) = fk{t), then 

mfkit + St)})) - + St {Uf{k -l,t) + Uf{k, t)) (B3) 

= {f,it))+St((^) V (B4) 



dt 



dct 



where {dfk/dt)dct is simply the deterministic part of the dynamics in Eq. (10). On the other hand, if 0({//c}) is a 
pairwise product of the form 

0({/fc}) = A-i(i)/fc.(t), (B5) 
then things start to become more complicated. Organizing all the terms in powers of St, we see that 

fkAt + st)fk,{t + St) = fkjk, + St [{ufk,-i - ufkjfk, + Ik, (f//fe,-i - uh,) 



X! ^ •^'=1] [^k2,]2 - fk2] V fjlfj2VjlVj2 



5tO{Tj). (B6) 



The term proportional to v St is linear in 77, so upon averaging this term vanishes. The term proportional to St also 
contains a term involving r/, but this time as a quadratic function rather than a linear function, which yields 

( [^^=1 " ■^'=1] ['^'=2^2 - fk2] \/ fjJn^l3iV32 ) = \ Y1 ['^'=1 J ~ ■^'^1] ~ ■fk2] f] ) (B7) 



\J1J2 



— [Ski,k2 - fk2] fki , (B8) 



where we have used the fact that the rjk are uncorrelated for different k. Thus, taking the average of Eq. (|B6|, we 
obtain 

d{f{k,,t)f{k2,t)) ^^fdf{k2,t) 



dt 



l^{f{k,,t)^f{k2,t)), (B9) 



where we have defined a new operation ★ such that 

f{k,,t)^f{k2,t) = [Sk,,k2- fk2]fk, ■ (BIO) 

This is the generalized product rule of the Ito calculus, which arises from the combination of two ^/ St terms in the 
expansion of the Langevin equation. More complicated functions of the fk can be analyzed recursively with the help 
of the sum and product rules 

— dt — = = \^/ + \^/ ^^'^^ 



and 



where 7k- is defined in terms of the underlying fitness classes fk and satisfies additivity and the distributive property. 
As an example, we have 

k * f{k, t)^Y.^' [f{k, t) * /(fc', t)] (B13) 

k' 

= Y,k'[S{k-k')-f{k)]f{k') (B14) 

k' 

= {k-k)f{k,t), (B15) 
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and 

fe^fc = ^fc[fc7k-/(fc,i)] (B16) 

k 

= Y,k{k~k)f{k,t) (B17) 

k 

^Y.^k~kff{k,t). (B18) 

k 

These rules can be used to rapidly generate equations of motion for the generalized moments Af^ and F^..fi analyzed 
in the text (see Appendices C and D). 



Appendix C: Central Moments 



In this section, we use the rules of the stochastic calculus in Appendix B to derive equations of motion for the mean 
"fitness" 



k = / dkkf{k,t) 



and the central moments 



M„ 



dk{k-k)"'f{k,t) 



(CI) 



(C2) 



Without loss of generality, we will restrict our attention to the full dynamics in Eq. (45), where fc is a continuous 
variable proportional to the absolute fitness. Directly from the Langevin equation, we can see that 



d{k) 



= dkk 



dr 



dk k l^NU J d(Ak) p{Ak) [f{k - Ak) - f{k)] + Ns{k - fc)/(fc)^ 
= NU{Ak) + Ns ■ M2 . 
For the central moments Mm, we can use the product rule in Eq. (B12) to show that 



dr 



dkik^kr'^^^m 
^ ' dt 



dk{k - ky'-'-fik) 



dk\ 



dkik-kr-' [k^fik)] + Q 



dr/ 

dk{k-kY''-^f{k) 



k-kk 



(C3) 

(C4) 
(C5) 



(C6) 



dM„ 



m-2 



1=0 



= iVC/ 51 L {5k^-')M, + M2,™_2 - mAf,„ + Ns (M,„+i - mAfs.m-i) 



In terms of the rescaled moments Mm — Mm/iNU)"^^^ , this can be rewritten in the form 



dM„ 



E 

1=0 



m 
2 + 1 



Ak 



iVcr 



Mm+l ~ mM2^m-l 



Mo 



mM„ 



where Na = NU{Ns)^. Thus, in the limit that NU — ;> 00, this reduces to 
dMm 



dr 



{Ak')Mm^2 



M. 



2,m-2 



mMm + Na 



M, 



rn+1 



mM' 



(C7) 



(C8) 
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which is independent of NU and depends on the distribution of fitness effects only through the second moment (Afc^). 

In order to obtain equations of motion for the generahzed products Mmi,...,mj^ we can again appeal to the product 
rule in Eq. (B12) which shows that (with some abuse of notation) 



dM„ 



This requires us to compute 



dr 



k'kM„ 



dM„ 
~d7 



dk (fc - fc)™[/c ★ f{k, t)] - mMm-lik * k) 



(C9) 

(CIO) 
(Cll) 



and 



M™*M„- / dkidk2{ki-kr{k2-k)[f{k^)i.f{k2)]-mMm-i J dk{k-kY[k^f{k)] 
nMn-i j dk{k-ky"[ki.f{k)] + mn(fc* fc)M„_iM„_i 

,m— l,n— 1 • 



= Mn 



M„ 



(C12) 
(C13) 



The equations of motion for generalized products with more than two terms follow from the product rule in Eq. ( B12[ ) 



Appendix D: Focal Lineage Moments 

In this section, we use the rules of the stochastic calculus in Appendix B to derive equations of motion for the focal 
lineage moments 



F„ 



dk{k^k)"'fi{k,t) 



(Dl) 



discussed in the main text. Starting from the product rule in Eq. (B12), we have 

dFrr, 



dk{k-kr^^^-m 
^ ' dt 



dk{k~k)"'-^fi{k) 



dk 



+ (-m J dfc(fc- fc)™-i [fc ★/!(/£)] + 



dk {k - k)"'-^ fi{k) 



k^ k 



(D2) 



In order to progress further, we must extend the * operation to the case where the fitness classes fi{k,t) are labeled 
according to which lineage they descend from. It is a straightforward matter to show that 



Thus, we have 



f^^{ki,t)-k f,^{k2,t) = [Si,iJ{ki - fca) - fi2{k2)] fiA^i) ■ 
k^h{k,t)^Yl J k'[Mk',t)^,h{k,t)] 

i 
i 

= {k-k)hik,t), 



and we can immediately conclude that 

m-2 



dFra 

dr 



F„i-2-2 - mFrn + Ns {Fm+1 - mFm-1-2) 



(D3) 

(D4) 

(D5) 
(D6) 

(D7) 
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In order to derive the equations of motion for the general products F,%.fi considered in the text, we simply need to 
calculate * products of the form Fm * Af„ and Fm * Fn . Starting from 



ki.Fm = -mFm-iik*k)+ J dk {k ~ k)"'[k * fi{k,t)] (D8) 
= F,n+i - r7iF„i_i;2 , (D9) 

we can easily see show that 

F„ * = *2 (fci - kr{k2 - kr[.fi{ki)*Mk2)] 

i 

-mi^„_i^ y" dk{k-k)'"[ki.f,{k)]-nMrr-i J dk{k ^k)"'\ki. fi{k)] 

+ mnFm^iM,n~i[k*k] (DIO) 

= Fm+n — Fm-n — mFm-Un+l — nFjn+Un-l + 'mnFm~l;n-l,2 (DH) 

and 

Fm^F^ = J dki dk2 {ki -kr(k2 -krUiiki) * fi{k2)] - rnF,n-i J dk {k ~kY\k ^ h{k)] 

-nFn-i j d/c(/c-fc)"[fc*/i(A:)] +mnF,„_ii^„_i[fc*fc] (D12) 

= Fm+n — Fm,n — mFm-l^n+l — nF,n+l.n-l + '77-nFm_i^„_i:2 ■ (D13) 

The equations of motion for generalized products with more than two terms follow from the product rule in Eq. ( B12[ ). 



